The Compositional Interpretation of Nominal Compounds
Abstract
The analysis of nominal compound constructions has proven to be a recalcitrant problem for linguistic semantics and poses serious challenges for natural language processing systems. We argue for a compositional treatment of compound constructions which limits the need for listing of compounds in the lexicon. We argue that the development of a practical model of compound interpretation crucially depends on issues of lexicon design. The Generative Lexicon (Pustejovsky 1995) provides us with a model of the lexicon which couples sufficiently expressive lexical semantic representations with mechanisms which capture the relationship between those representations and their syntactic expression. In our approach, the qualia structures of the nouns in a compound provide relational structure enabling compositional interpretation of the modification of the head noun by the modifying noun. This brings compound interpretation under the same rubric as other forms of composition in natural language, including argument selection, adjectival modification, and type coercion. We examine data from both English and Italian and develop analyses for both languages which use phrase structure schemata to account for the connections between lexical semantic representation and syntactic expression. In addition to applications in natural language understanding, machine translation, and generation, the model of compound interpretation developed here can be applied to multi-lingual information extraction tasks.

MICHAEL JOHNSTON AND FEDERICA BUSA

1. The Semantic Interpretation of Noun-Noun Compounds

Complex nominals, and in particular noun-noun compounds, have received a great deal of attention in linguistic research (Bergsten 1911, Jespersen 1942, Marchand 1970, Lees 1970, Downing 1977, Levi 1978, Warren 1987).
The problems that they pose can hardly be ignored in computational linguistics, where their analysis has presented a serious challenge for natural language processing systems (Finin 1980, McDonald 1982, Isabelle 1984, Alshawi 1987, Hobbs et al. 1993, Bouillon et al. 1992, Jones 1995, Johnston, Boguraev, and Pustejovsky 1995, Copestake and Lascarides 1997). Among the most pressing concerns is the definition of a systematic method for recovering or predicting the implicit semantic relation that holds between a head noun and a modifier noun. For instance, the compound bread knife refers to a knife which is used to cut bread, while lemon juice refers to juice which is obtained by squeezing lemons.

Previous analyses were based on essentially descriptive methodologies (Levi 1978, Jones 1995), relying on the enumeration of the semantic relations observed in the language. The limitation of these approaches is that they can hardly be exhaustive, since they rely on purely descriptive machinery. The proposal presented in this paper aims at assimilating the composition of complex nominals to other compositional processes of natural language. To this end, we argue that composition in compound constructions involves specification of the arguments of predicate structures within the qualia of the head noun. In essence, the qualia structure provides the `glue' which links together the semantic contributions of the modifying noun and the head noun in the compound. The predicates in the qualia are not there just to account for compounds; they also account for a wide variety of forms of composition and interpretation, including argument selection, adjectival modification (cf. Bouillon 1995), type coercion (cf. Pustejovsky 1991, 1995), and nominalization (cf. Busa 1996).

We examine data from both English and Italian and contrast the differing strategies for compound formation in the two languages.
Examination of both languages in parallel is important since information regarding the semantic relationship between the elements of a compound which is implicit in English is marked syntactically in Italian. In order to account for the availability of compound constructions in both English and Italian, we utilize phrase structure schemata that capture the different ways in which head nouns and modifying nouns can compose.

1.1. A CROSS-LINGUISTIC APPROACH

The comparison of English and Italian data yields very interesting results. On the one hand, it provides very strong evidence in favor of a qualia-based semantics; on the other, it reveals general properties of the internal structure of compound nominals which have important consequences for multilingual language processing. Consider the correspondences in (1), below:

(1) a. bread knife       coltello da pane
    b. wine glass        bicchiere da vino
    c. bullet hole       foro di pallottola
    d. lemon juice       succo di limone
    e. glass door        porta a vetri
    f. silicon breast    seni al silicone

The English noun-noun compounds in (1), above, consist of a modifying noun followed by the head noun, and the semantic relation that holds between them is unspecified and left implicit. In Italian, instead, the preposition that intervenes between the head and the modifying noun indicates the nature of the semantic relation holding between the two nouns (see footnote 1).

Consider first the forms in (1a) and (1b), where the modifying noun provides information regarding the purpose or function of the object described by the head noun. Thus, in (1a), bread specifies the object which the knife is typically used to cut; in (1b), wine specifies the substance that the glass is used to hold. When the modifier specifies some aspect of the purpose of the head, the preposition in the Italian form generally is da.

In (1c) and (1d), the modifier relates to the origin of the object described by the head noun, that is, how it was brought about.
A bullet hole is a hole which was brought about by the passage of a bullet, and lemon juice is juice that is brought about by squeezing a lemon. When the modifier specifies some aspect of the origin of the head, the appropriate preposition for the Italian form appears to be di.

In (1e) and (1f), the modifier relates to the constitution of the object described by the head noun, that is, what it is made of. A glass door is a door made of glass, while a silicon breast is a breast composed, at least partially, of silicon. For forms in which the modifier specifies the constitution of the head, the appropriate preposition in Italian is a.

While English compounds can easily be described in terms of sequences of adjacent nouns, the syntactic constituency of Italian compounds is less clear. We turn now to examine the structural differences in the syntactic expression of compounds in English and Italian.

Footnote 1. In Italian, most of the compounds that are possible involve an underived verb (e.g. portadocumenti (document holder), segnalibro (bookmark), spaventapasseri (scarecrow), etc.). There are, however, a few other noun-noun compounds such as trenomerci (freight train) and vacanza-studio (holiday-study), but these are infrequent and not our concern here. See Beard (1996) for an interesting account of why Italian favors the type of constructions illustrated in (1) above.

1.2. STRUCTURAL DIFFERENCES IN ENGLISH AND ITALIAN

In an earlier version of this work (cf. Busa and Johnston, 1995), we focussed on the role of the Italian preposition for disambiguation purposes and did not provide a further analysis of the structural properties of complex nominals. Upon closer scrutiny, however, there is an important question which should be answered, namely whether or not the postmodifier is a PP, and thus has internal syntactic structure and allows recursion. Consider the examples in (1), some of which we repeat below for convenience:

(3) a. coltello da pane      bread knife
    b. bicchiere da vino     wine glass
    c. porta a vetri         glass door

If the modifiers in (3) are PPs, then we should be able to introduce additional material between the head noun and the modifier. The expressions in (4)-(6), below, illustrate NPs where the head noun is modified by an adjunct, and an adjective can intervene between the head noun and the PP:

(4) a. coltello sul frigo                knife on the fridge
    b. coltello tagliente sul frigo      sharp knife on the fridge

(5) a. bicchiere nel lavandino           glass in the sink
    b. bicchiere sbeccato nel lavandino  chipped glass in the sink

(6) a. porta girevole                    revolving door
    b. porta girevole sul retro          revolving door at the back

The compounds in (3), however, behave differently. As shown in (7) and (8), an adjective may not intervene between the head noun and its modifier (cf. (7a) and (8a)), but is only licensed if it modifies the whole compound, as shown in (7b) and (8b), below:

(7) a. *coltello tagliente da pane       `knife sharp for bread'
    b. [coltello da pane] tagliente      `[knife for bread] sharp'
       sharp [bread knife]

(8) a. *porta girevole a vetri           `door revolving with glass'
    b. [porta a vetri] girevole          `[door of glass] revolving'
       revolving [glass door]

In both languages an adjectival modifier of the head is not possible, but an adjectival modifier of the whole compound is possible (see footnote 2). In cases where there is recursion within the compound, the modifier must itself be a compound, as shown in (9) and (10) below:

(9) a. forno a microonde                       microwave oven
    b. [piatto da [forno a microonde]]         microwave oven dish

(10) a. fabbrica di cioccolato                 chocolate factory
     b. [fabbrica di [cioccolato da dolci]]    cake chocolate factory

An additional observation concerns the status of certain adjectival forms which, we argue, are not evidence for the PP hypothesis, since, again, no additional material may appear in the compound. This is shown in the contrast between white chocolate and sweet chocolate:

(11) a. cioccolato bianco          white chocolate
     b. *cioccolato molto bianco   very white chocolate

(12) a. cioccolato dolce           sweet chocolate
     b. cioccolato molto dolce     very sweet chocolate

Footnote 2. Interestingly, glass revolving door would appear to be an exception, since it is well-formed in English. The reason for this is that glass door and revolving door are both compound forms. Thus, we have two options: one where revolving modifies glass door; the other where glass modifies revolving door.

The expression in (11a) is a compound, and thus the modifier very is not licensed. The expression in (12), instead, is an instance of adjectival modification of a head noun and thus allows recursion. These distinctions are reflected below, where neither (11b) nor (12a) is licensed in a compound structure:

(13) a. [torta al [cioccolato bianco]]          white chocolate cake
     b. *[torta al [cioccolato molto bianco]]   very white chocolate cake
     c. *[torta al [cioccolato dolce]]          sweet chocolate cake

An analysis in which the Italian equivalents of English nominal compounds are treated as instances of prepositional phrase modification will fail to account for the data described above. In our analysis, we treat the prepositions not as syntactic elements in their own right but rather as bound elements of the Italian nominal compounding construction. In a sense, they can be thought of as bound morphemes surrounded by white space: infixes which appear between the nouns in a nominal compound. Both English and Italian compounds will be treated as instantiations of compound construction schemata.

Our analysis of compounds in English and complex nominals in Italian utilizes the representational framework of the Generative Lexicon (GL) (Pustejovsky 1991, 1995) for lexical semantic representation. Before turning to a more detailed explanation of our account, the next section provides a brief sketch of this approach.
The GL lexical representation is embedded within a simplified feature-based representation of syntax and semantics similar in some respects to Head-driven Phrase Structure Grammar (Pollard and Sag 1987, 1994).

2. The Generative Lexicon and Lexical Representation

For the purposes of this paper, the lexical semantic representation we employ is a simplified form of the Generative Lexicon (GL) representation employed in Pustejovsky (1995). Our lexical semantic entries include four levels of representation: type structure, argument structure, event structure, and qualia structure. The latter in turn expresses four aspects of the meaning of the lexical item: formal, constitutive, telic, and agentive. These lexical entries are encoded using typed feature structures (Carpenter 1992, Pollard and Sag 1994). The basic layout of the lexical semantic representations we employ is given in (14).

(14)
    [ typestr  = [ arg1 = the type of ]
      argstr   = [ d-arg1 = other arguments in the qualia ]
      eventstr = [ e1 = events in the qualia ]
      qualia   = [ formal       = isa-relation
                   constitutive = parts of
                   telic        = purpose of
                   agentive     = how is brought about ] ]

Given this model of lexical representation, a noun such as knife has the entry in (15). The predicates in the qualia specify the definitional properties of knife. Participants in these predicates other than the knife itself are listed as default arguments (d-arg1, d-arg2, and d-arg3) in argstr. Events are listed as default events (d-e1, d-e2), which means that for nominals they are already quantified in the lexicon (cf. Busa 1996 for motivation).
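As a rough illustration outside the original formalism, the four levels of representation can be written as a plain data structure. The Python sketch below is an assumption-laden toy: the class and method names (Predicate, Content, declared, undeclared) are hypothetical, typed feature structures are flattened into dictionaries, and the values for knife are copied from entry (15). It checks the property just described, namely that every participant in a qualia predicate is declared in typestr, argstr, or eventstr.

```python
from dataclasses import dataclass
from typing import Dict, Set, Tuple, Union

@dataclass(frozen=True)
class Predicate:
    name: str
    args: Tuple[str, ...]  # variables; the event variable comes first

# feature name -> (variable, semantic type); a flattened stand-in for typed features
Typed = Dict[str, Tuple[str, str]]

@dataclass
class Content:
    typestr: Typed
    argstr: Typed
    eventstr: Typed
    qualia: Dict[str, Union[str, frozenset, Predicate]]

    def declared(self) -> Set[str]:
        """Variables introduced in typestr, argstr, or eventstr."""
        levels = (self.typestr, self.argstr, self.eventstr)
        return {var for level in levels for var, _ in level.values()}

    def undeclared(self) -> Set[str]:
        """Variables used in qualia predicates but declared nowhere."""
        used: Set[str] = set()
        for value in self.qualia.values():
            if isinstance(value, Predicate):
                used.update(value.args)
        return used - self.declared()

# Values copied from entry (15) for knife.
knife = Content(
    typestr={"arg1": ("x", "artifact_tool")},
    argstr={
        "d-arg1": ("y", "physobj"),
        "d-arg2": ("w", "human"),
        "d-arg3": ("z", "human"),
    },
    eventstr={"d-e1": ("e1", "transition"), "d-e2": ("e2", "process")},
    qualia={
        "formal": "x",
        "constitutive": frozenset({"blade", "handle"}),
        "telic": Predicate("cut_act", ("e2", "w", "x", "y")),
        "agentive": Predicate("make_act", ("e1", "z", "x")),
    },
)

# An empty result means every qualia participant is listed in the entry.
print(knife.undeclared())
```

Keeping the qualia predicates as tuples of variable names, rather than of values, is a deliberate simplification: it mimics structure-sharing by name, so that changing the type attached to y in argstr is visible wherever y occurs in the qualia.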

(15) knife
    [ typestr  = [ arg1 = x : artifact_tool ]
      argstr   = [ d-arg1 = y : physobj
                   d-arg2 = w : human
                   d-arg3 = z : human ]
      eventstr = [ d-e1 = e1 : transition
                   d-e2 = e2 : process ]
      qualia   = [ formal       = x
                   constitutive = {blade, handle, ...}
                   telic        = cut_act(e2, w, x, y)
                   agentive     = make_act(e1, z, x) ] ]

Representations such as that in (15) are intended to present the semantic content of a particular lexical item or compound. In order to represent the composition of compounds, these representations are embedded within a lexical representation which also includes orthographic and syntactic information.

(16)
    [ ORTH    = ...
      CAT     = ...
      CONTENT = ...
      DTRS    = [ HEAD = ...
                  MOD  = ... ] ]

The orth feature contains the orthographic form of the word or compound. The cat feature contains syntactic category information. content contains the semantic representation, such as that in (15). The dtrs feature is found only in composite forms, and it specifies the lexical representations of the head and the modifier of which the entry is composed. This upper layer of lexical representation is not intended as a comprehensive or theoretically significant proposal for lexical representation; it simply serves as a generic skeleton to house the lexical semantic representation, which is our real concern here. For the sake of brevity we will continue to use the semantic representation alone, as in (15), when the higher level lexical structure is not relevant.

In the next section, we show how the three classes of compounds considered so far can be treated as instances of telic, agentive, and constitutive qualia modification respectively.

3. Qualia Modification

3.1. TELIC QUALIA MODIFICATION

In order to illustrate our approach, we will start with examples such as bread knife (1a), in which the modifying noun relates to the purpose of the head noun.
The preferred interpretation of this compound is that it is a knife which is used to cut bread. The fact that a knife is an object whose inherent purpose is to cut things is encoded by the predicate cut_act in the telic role (cf. (15), above). The function of the modifier bread is to specify the third argument of the cut_act relation. The feature structure associated with bread knife will be as in (17). The first default argument d-arg1 has been specialized from physobj to bread, and this value is structure-shared with the third argument position in the cut_act predicate.

(17) bread knife
    [ typestr  = [ arg1 = x : artifact_tool ]
      argstr   = [ d-arg1 = y : bread
                   d-arg2 = w : human
                   d-arg3 = z : human ]
      eventstr = [ d-e1 = e1 : transition
                   d-e2 = e2 : process ]
      qualia   = [ formal       = x
                   constitutive = {blade, handle, ...}
                   telic        = cut_act(e2, w, x, y)
                   agentive     = make_act(e1, z, x) ] ]

In the GL representation, all of the participants which show up in the predicates in qualia are listed as default argument parameters in the argstr. In the analysis of English compounds presented below, the argstr provides an entry point for modification of the qualia of the head noun.

3.2. AGENTIVE QUALIA MODIFICATION

Compounds such as bullet hole and lemon juice (1c,d), in which the modifier relates to the origin or bringing about of the object described by the head noun, are treated as modification of the agentive role. In the case of lemon juice, the head juice will have a squeeze_act as its agentive, and the object squeezed will be listed as a default argument. The function of the modifying noun lemon is to further subtype this argument. This is possible because lemon is a subtype of fruit. The resulting representation for lemon juice is as in (18).

(18) lemon juice
    [ typestr  = [ arg1 = x : liquid ]
      argstr   = [ d-arg1 = y : lemon ]
      eventstr = [ d-e1 = e1 : transition ]
      qualia   = [ formal   = x
                   agentive = squeeze_act(e1, y, x)
                   ... ] ]

As mentioned earlier, the preposition used in Italian forms for specification of an argument in the agentive is di.

3.3. CONSTITUTIVE QUALIA MODIFICATION

Another common function of modifiers in complex nominals is to specify a subpart of the denotation of the head noun or the material of which it is composed. Examples of this are given in (1e,f). In our treatment, this involves modification of the constitutive role. The prepositions used in Italian for this sort of modification are a and al. The modifiers glass and silicon denote materials. When composed with nominals such as door and breast they specify elements of the constitutive role. For example, glass door is represented as in (19).

(19) glass door
    [ typestr  = [ arg1 = x : phys_obj
                   arg2 = y : aperture ]
      argstr   = [ d-arg1 = w : individual
                   d-arg2 = z : individual ]
      eventstr = [ d-e1 = e1 : transition
                   d-e2 = e2 : transition ]
      qualia   = [ formal       = hold(y, x)
                   constitutive = {x : glass}
                   telic        = walk_through_act(e2, w, y)
                   agentive     = make_act(e1, z, x·y) ] ]

The basic pattern established so far is that modification of telic, agentive, and constitutive involves da, di, and a, respectively. This is a useful generalization, but the correspondence between the different qualia roles and the different choices of preposition in Italian is not as clear cut as this suggests. In the examples of telic qualia modification considered so far (1a,b), the modifying noun was always of type individual. Matters become more complex when compounds in which the modifying noun describes an event are considered. We address these cases in Section 5. In the next section, we present our analysis of compound formation in English and Italian.

4. Analysis of Compound Constructions

In order to account for the availability of compound forms in both English and Italian, we utilize phrase structure schemata. These schemata are essentially the same kind of entity as the Immediate Dominance Schemata employed in Head-driven Phrase Structure Grammar (Pollard and Sag 1994). They license the availability of complex nominals, which we treat as phrasal signs, and are essentially phrase structure rules. Compounds are licensed and interpreted as part of the process of parsing. A similar approach is described in Copestake and Lascarides (1997), where specific types of compounds are treated as subtypes of a general noun-noun compound schema. Our approach is similar in spirit, but differs in its adoption of qualia structure as the vehicle for expression of the semantic relationships found in compounds.

The combination of words into compound forms could also be captured using lexical rules (Flickinger 1987, Pollard and Sag 1987). We have chosen to use phrase structure schemata rather than lexical rules on the basis of storage considerations. Each lexical rule used for compounds will license a great many modifiers for a large number of potential heads. If the lexical rules are used at a pre-compilation stage in order to flesh out the lexicon, allowing lexical rules for compounds will result in a massive increase in the size of the lexicon: for each noun, a huge number of compound forms will be generated. If lexical rules for compounds are instead allowed to apply at runtime during the parsing process, then the storage problem is avoided, but they are then really no different from phrase structure schemata.

4.1. THE ENGLISH COMPOUND CONSTRUCTION

For English compounds, what we need to capture is that a modifying noun can specify the semantic type of one of the arguments in the qualia of the head noun.
Since the indices of all the arguments in the qualia, other than that of the head noun, appear as default arguments, the range of modification possible can be captured by structure-sharing with positions in the argstr. The basic structure of the schemata licensing the combination of nouns to form noun compounds is as in (20), where bracketed numbers such as [1] are structure-sharing tags.

(20)
    modifier [1] [ orth =
                   cat  = N ]
    head     [2] [ orth =
                   cat  = N ]
      ==>
    compound     [ orth =
                   cat  = N
                   dtrs = [ head = [2]
                            mod  = [1] ] ]

To clarify their function, the schemata are shown as rules here. They are in fact encoded as single feature structures. The schemata differ with respect to the constraints placed on the content values and the way in which the content values of the head and the modifier are composed to generate the content for the compound as a whole. The availability of compound forms such as bread knife, where the modifier specifies an argument in the telic, is accounted for by the schema in (21).

(21)
    modifier noun [1] [ orth    =
                        cat     = N
                        content = [4] individual ]
    head          [2] [ orth    =
                        cat     = N
                        content = [3] ]
      ==>
    compound          [ orth    =
                        cat     = N
                        content = [3] = [ argstr = [ d-arg1 = [4] ] ]
                        dtrs    = [ head = [2]
                                    mod  = [1] ] ]

The content of the resulting compound is inherited from the head noun. In order to access the argument in the telic, the content value of the modifier is structure-shared with the first default argument in the content of the head. The modifying noun must be of semantic type individual, and its content value is structure-shared with the d-arg1 in the argstr of the resulting compound. The lexical representation of the compound also contains an attribute dtrs containing a head and a mod value. These are structure-shared with the lexical representations for the head noun and the modifying noun respectively. This schema is one of a number which are used to license this kind of modification of default arguments. There will also be schemata for modification of other default arguments.
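The effect of schema (21) can be approximated in a few lines of Python. This is a minimal sketch under stated assumptions, not the typed-feature-structure encoding the paper uses: the names Sign, SUPERTYPE, is_subtype, and compound_d_arg1 are hypothetical, the type hierarchy is a toy, and structure-sharing is simulated by copying the modifier's type into the d-arg1 slot. The subtype check reflects the description of d-arg1 being specialized from physobj to bread.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

# Toy type hierarchy (child -> parent); purely an illustrative assumption.
SUPERTYPE = {"bread": "physobj", "physobj": "individual", "human": "individual"}

def is_subtype(t: Optional[str], ancestor: str) -> bool:
    """True if t equals ancestor or lies below it in SUPERTYPE."""
    while t is not None:
        if t == ancestor:
            return True
        t = SUPERTYPE.get(t)
    return False

@dataclass(frozen=True)
class Sign:
    orth: str
    cat: str
    sem_type: str                             # type of the sign's own denotation
    argstr: Tuple[Tuple[str, str], ...] = ()  # (default argument, type) pairs
    head: Optional["Sign"] = None             # dtrs: head daughter
    mod: Optional["Sign"] = None              # dtrs: modifier daughter

def compound_d_arg1(modifier: Sign, head: Sign) -> Optional[Sign]:
    """Schema (21), roughly: the modifier specializes d-arg1 of the head."""
    if modifier.cat != "N" or head.cat != "N":
        return None
    if not is_subtype(modifier.sem_type, "individual"):
        return None  # the schema requires an individual-denoting modifier
    args = dict(head.argstr)
    slot_type = args.get("d-arg1")
    if slot_type is None or not is_subtype(modifier.sem_type, slot_type):
        return None  # the modifier must be able to specialize the slot
    args["d-arg1"] = modifier.sem_type
    # The content (here: sem_type and argstr) comes from the head noun.
    return Sign(f"{modifier.orth} {head.orth}", "N", head.sem_type,
                tuple(args.items()), head, modifier)

bread = Sign("bread", "N", "bread")
hunting = Sign("hunting", "N", "process")
knife = Sign("knife", "N", "artifact_tool",
             (("d-arg1", "physobj"), ("d-arg2", "human"), ("d-arg3", "human")))

bread_knife = compound_d_arg1(bread, knife)
```

In this toy, bread knife keeps the type of knife and only narrows d-arg1, while an event-denoting modifier such as hunting is rejected by this particular schema, which matches the restriction to modifiers of type individual.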
The fact that the content of the compound always comes from the head noun is captured by having all of the compound phrase structure schemata, which are themselves implemented as types, inherit the constraint specified by the structure-sharing index [3]. For English, the same set of schemata which connect the modifying noun semantics with the default arguments can account for modification of different qualia. The kind of qualia modification involved depends on which quale the default argument in question is shared with. We turn now to the analysis of the Italian forms.

4.2. THE ITALIAN COMPOUND CONSTRUCTION

Unlike English compounds, the equivalent constructions in Italian include an overt element, the preposition, which at least partially specifies the semantics of the modification relation between the elements of the compound. For these Italian constructions there are two possible ways to capture the combination of head and modifying noun phrases. One approach is parallel to the approach to the English forms: phrase structure schemata are employed which require different prepositions depending on the nature of the semantic relation involved. The other approach is to encode the syntax and semantics of the modification relation directly in the preposition itself, and rely on more general phrase structure schemata for the syntactic combination. The former essentially involves addition of syntactic rules, while the latter involves addition of lexical entries for the prepositions. Given the facts outlined in Section 1, where `prepositions' such as da, di, and a are viewed as productive morphemes, we adopt the former approach, encoding the Italian complex nominal composition using phrase structure schemata.

As an illustrative example, consider the preposition da, which can be used to specify the type of one of the arguments in the telic role.
We need to capture the fact that the sequence HEAD N, da, MODIFYING N can be interpreted as having the semantic content of the modifying N specify one of the arguments within the telic role of the head N. The schema for da compounds is as in (22).

(22)
    modifier noun [1] [ orth    =
                        cat     = N
                        content = [4] individual ]
    head          [2] [ orth    =
                        cat     = N
                        content = [3] ]
    da
      ==>
    compound          [ orth    = da
                        cat     = N
                        content = [3] = [ qualia = [ telic = R[.. [4] ..] ] ]
                        dtrs    = [ head = [2]
                                    mod  = [1] ] ]

The indeterminacy with respect to which argument in the telic is coindexed with the modifier in schema (22) is a shorthand representation. Either indeterminacy needs to be supported by the representation, or multiple lexical entries are required, each specifying linking to a different argument position in the telic. The Italian forms involving agentive modification, such as succo di limone, are accounted for by a schema like (22), except that the preposition is di and the linkage is to the agentive qualia role:

(23)
    modifier noun [1] [ orth    =
                        cat     = N
                        content = [4] individual ]
    head          [2] [ orth    =
                        cat     = N
                        content = [3] ]
    di
      ==>
    compound          [ orth    = di
                        cat     = N
                        content = [3] = [ qualia = [ agentive = R[.. [4] ..] ] ]
                        dtrs    = [ head = [2]
                                    mod  = [1] ] ]

Cases of constitutive qualia modification, such as porta a vetri, are handled with a parallel schema in which the preposition is a and the linkage is to the constitutive role. We turn now to consider cases in which the telic is modified by a noun describing an event.

5. Telic Event Modifiers

In Section 3.3, above, we hinted at the problem that the `preposition' da is not the only one associated with the telic role. In some forms where the modifier describes an event, the appropriate preposition in Italian is da, as in the forms in (24), while in others the preposition is di, as in the forms in (25).

(24) a. fucile da caccia             hunting rifle
     b. macchina da corsa            race car
     c. legno da intaglio            carving wood

(25) a. armi di distruzione          weapons of destruction
     b. carta di credito             credit card
     c. casa di riposo               rest home
     d. campo di concentramento      concentration camp
     e. procedura di divorzio        divorce procedure

In general, the telic use of the preposition di is consistently associated with modifiers denoting events. Even though this does not yet explain the difference between (24) and (25), it already provides us with a restriction on the use of prepositions: da selects for any type, while di is restricted to events.

Finer-grained distinctions in the use of prepositions are based on a closer analysis of events. We assume the Vendlerian distinction between activities, states, accomplishments, and achievements, and we adopt a decompositional view of event structure, as outlined in Pustejovsky (1991). In this framework, we can determine the selectional properties of di and da on the basis of the event type of the modifiers. Nominals such as hunting, race, and carving describe activities. Nominals such as destruction, credit, and so on, in (25) above, describe the result of an activity.

This distinction arises quite clearly in the glosses of (24) and (25). Compound forms such as hunting rifle or race car in (24) describe, respectively, an instrument which is used when hunting and a vehicle that is driven for the purpose of racing. Conversely, the reading of the compounds in (25) makes explicit the result which is achieved by using a particular object. In particular, (25a) refers to weapons that bring about destruction, (25b) to a card that brings about a credit, and so on.

Unlike the operation which derives bread knife by associating the modifier with an argument position in the telic role of knife, the compositional operations which involve events produce a more complex structure. We argue that compounds where the modifying noun describes an event, such as those in (24), involve co-composition of the qualia structures of the head and the modifier.
The resulting representation has a complex telic role with "sub-qualia". In the case of hunting rifle, the telic of rifle, which is fire, provides the agentive within the telic of the compound. The modifier hunting is a process nominal and provides hunt as the telic within the telic of the compound. Through the application of phrase structure schemata which constrain this co-composition, we obtain the representation in (26) for hunting rifle.

(26) hunting rifle
    [ typestr  = [ arg1 = x : rifle ]
      argstr   = [ d-arg1 = w : human
                   d-arg2 = z : prey ]
      eventstr = [ d-e1 = eP1 : process
                   d-e2 = eP2 : process ]
      qualia   = [ formal = x
                   telic  = [ activity_lcp
                              telic    = hunt(eP2, w, z)
                              agentive = fire(eP1, w, x) ] ] ]

The interpretation of the compound form hunting rifle can be glossed as follows: "a rifle which is used in its typical capacity (i.e. firing) for the purpose of performing the activity of hunting." The assignment of a complex structure to an individual quale is coherent with the general interpretation of qualia structure. Exploiting these recursive properties of event-denoting qualia is not an ad hoc move to account for the interpretation of complex nominals. Rather, it provides a way of extending the expressive potential of qualia roles, and is independently motivated by the behavior of agentive nominals and their semantic contribution in context (cf. Busa 1996).

The modifying noun in Italian complex nominals with the preposition di describes the result that is achieved by performing the particular function associated with the head noun. The nominal destruction, in (25a), unlike the event nouns hunting and race, which denote activities, is the nominalization of the transitional event denoted by the verb destroy.
The two subevents, namely the process and the resulting state, in the event structure representation of the verb are encoded in the nominalized form as separate events in the agentive and formal roles, and they are related by the relation of temporal precedence [...]

[...] to note, however, that not all Italian complex nominals involving post-modification can be translated as noun-noun compounds in English. For example, forms such as coltello da macellaio (literally, knife of butcher), in which the modifier is an agent using the object described by the head, do not translate as butcher knife. In English, the appropriate nominal construction in this case uses the possessive: butcher's knife.

Translation from English to Italian is substantially more difficult given the difference in explicitness regarding the semantic relation between the head and modifier. In order to generate the proper output in Italian, it is necessary to determine the relation between the elements in the English compound structure and to determine the appropriate preposition in Italian for expression of that relation. One approach to this task is to use the GL representation language essentially as an interlingua (McDonald 1995). The phrase structure schemata for English are used in order to determine potential interpretations for a given English compound construction. The most likely interpretation from the candidate set is picked on the basis of contextual and statistical models. The content of the chosen candidate is then matched against the outputs of the various phrase structure schemata used for Italian. When an appropriate schema is identified, it is instantiated with lexical items from the Italian lexicon in order to generate the Italian translation. An important feature of this approach is that it utilizes resources which are independently needed for analysis of the languages involved. Aside from translation, the phrase structure schemata can also be used for multi-lingual generation.
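The translation procedure just outlined can be sketched as a small pipeline. The Python below is a toy under loud assumptions: the lexicon tables (HEADS, MODIFIER_TYPE, ITALIAN), the type hierarchy, and the function names are all hypothetical; only modifiers of type individual are covered; and the ranking by contextual and statistical models is replaced by simply taking the first candidate. The role-to-preposition table follows the canonical pattern for da, di, and a and ignores contracted forms such as al.

```python
from typing import List, Optional

# Toy type hierarchy (child -> parent); an illustrative assumption.
SUPERTYPE = {"bread": "physobj", "lemon": "fruit", "fruit": "physobj"}

def is_subtype(t: Optional[str], ancestor: str) -> bool:
    """True if t equals ancestor or lies below it in SUPERTYPE."""
    while t is not None:
        if t == ancestor:
            return True
        t = SUPERTYPE.get(t)
    return False

# Hypothetical head entries: each default argument records its type and
# the qualia role whose predicate the argument is shared with.
HEADS = {
    "knife": {"d-arg1": ("physobj", "telic")},
    "juice": {"d-arg1": ("fruit", "agentive")},
}
MODIFIER_TYPE = {"bread": "bread", "lemon": "lemon"}
ITALIAN = {"knife": "coltello", "bread": "pane",
           "juice": "succo", "lemon": "limone"}
# Canonical prepositions for individual-denoting modifiers.
PREPOSITION = {"telic": "da", "agentive": "di", "constitutive": "a"}

def interpretations(modifier: str, head: str) -> List[str]:
    """English side: qualia roles whose argument the modifier can specialize."""
    mtype = MODIFIER_TYPE[modifier]
    return [role for slot_type, role in HEADS[head].values()
            if is_subtype(mtype, slot_type)]

def translate(compound: str) -> Optional[str]:
    """Map an English noun-noun compound to an Italian complex nominal."""
    modifier, head = compound.split()
    roles = interpretations(modifier, head)
    if not roles:
        return None  # no English schema licenses this combination
    role = roles[0]  # stand-in for contextual and statistical ranking
    # Italian side: head noun first, then preposition, then modifier.
    return f"{ITALIAN[head]} {PREPOSITION[role]} {ITALIAN[modifier]}"

print(translate("bread knife"))
print(translate("lemon juice"))
```

The qualia role acts as the pivot between the two languages in this sketch: the English side recovers it from the argument the modifier specializes, and the Italian side spells it out as a preposition.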
If a particular concept is encoded in the GL lexical representation language, the language-specific phrase structure schemata can be employed to generate the corresponding complex nominal in each language.

In addition to the importance of successful translation of complex nominals for full-text machine translation, this functionality is useful in itself for applications in multi-lingual information retrieval and information extraction. Since complex nominals are so frequently used to coin terms which encapsulate important distinguished concepts within a domain, their successful identification and processing is an essential element of determining the topic of a text, and they provide important hooks for information retrieval. In a multi-lingual setting, such as information retrieval over the World Wide Web, it may be desirable for a search for a complex nominal from one language to yield documents regarding the same concept in other languages. The approach to translation of complex nominals described above enables this functionality. For a given compound form in English it is possible to determine potential realizations of that form in Italian.

7. Conclusion

In this paper, we have shown how the theory of qualia structure within the Generative Lexicon provides a representational framework for a compositional treatment of compounds. In compounds where the modifying noun describes an individual, the modifier further specifies, in composition, the type of an argument to a predicate in the telic, agentive, or constitutive role. In Italian, the canonical prepositions for these three kinds of modification are da, di, and a, respectively. In compounds where the modifying noun denotes an event, the composition in the compound frequently involves co-composition between the qualia structure of the head and modifier.
In Italian, for telic modification the preposition is da when the modifier describes an activity and di when the modifier describes a result.

In addition to its theoretical relevance, the approach to the semantics of complex nominals described here has important applications in the construction of natural language processing systems. In particular, it provides the foundations for machine translation of complex nominals between English and Italian and can be readily applied in multi-lingual generation and multi-lingual information extraction.

References

Alshawi, Hiyan. 1987. Memory and Context for Language Interpretation. Studies in Natural Language Processing. Cambridge University Press, Cambridge, England.
Beard, Robert. 1996. Head Operations and Head-Modifier Ordering in Nominal Compounds. Presentation at 1996 Linguistic Society of America Meeting, San Diego, California.
Bergsten, N. 1911. A Study on Compound Substantives in English. Almqvist and Wiksell, Uppsala.
Bouillon, Pierette. 1995. The Semantics of Adjectival Modification. Ms., ISSCO, Geneva.
Bouillon, P., K. Bosefeldt, and Graham Russell. 1992. Compound Nouns in a Unification-Based MT System. In Proceedings of the Third Conference on Applied Natural Language Processing (pp. 209-215). Trento, Italy.
Busa, Federica. 1996. Compositionality and the Semantics of Nominals. Doctoral Dissertation. Brandeis University.
Carpenter, R. 1990. Typed feature structures: Inheritance, (In)equality, and Extensionality. In W. Daelemans and G. Gazdar (eds.), Proceedings of the ITK Workshop: Inheritance in Natural Language Processing, Tilburg. Institute for Language Technology and Artificial Intelligence, Tilburg University (pp. 9-18).
Carpenter, R. 1992. The logic of typed feature structures. Cambridge University Press, Cambridge.
Copestake, Ann. 1995. Representing Lexical Polysemy. Working Notes of AAAI Spring Symposium on the Representation and Acquisition of Lexical Knowledge, Stanford University, Palo Alto, California.
Copestake, Ann, and Ted Briscoe. 1995. Semi-productive Polysemy and Sense Extension. Journal of Semantics 12.
Copestake, Ann, and Alex Lascarides. 1997. Integrating Symbolic and Statistical Representations: The Lexicon Pragmatics Interface. Proceedings of 35th Annual Meeting of the Association for Computational Linguistics. ACL Press, New Jersey.
Downing, P. 1977. On the Creation and Use of English Compound Nouns. Language 53. 810-842.
Finin, Timothy W. 1980. The Semantic Interpretation of Compound Nominals. Doctoral Dissertation. University of Illinois at Urbana-Champaign.
Flickinger, Daniel. 1987. Lexical Rules in the Hierarchical Lexicon. Doctoral Dissertation. Stanford University.
Hobbs, Jerry R., Martin E. Stickel, Douglas E. Appelt, and Paul Martin. 1993. Interpretation as Abduction. In Fernando C.N. Pereira and Barbara Grosz (eds.), Natural Language Processing. MIT Press, Cambridge, Massachusetts.
Isabelle, P. 1984. Another Look at Nominal Compounds. In Proceedings of the 10th International Conference on Computational Linguistics and the 22nd Meeting of the ACL (pp. 509-516).
Jespersen, Otto. 1942. A Modern English Grammar on Historical Principles, IV. Munksgaard, Copenhagen.
Jones, Bernard. 1995. Nominal Compounds and Lexical Rules. Working Notes of the Acquilex Workshop on Lexical Rules. Cambridge, England, August 1995.
Johnston, Michael, Branimir Boguraev, and James Pustejovsky. 1995. The Acquisition and Interpretation of Complex Nominals. Working Notes of AAAI Spring Symposium on the Representation and Acquisition of Lexical Knowledge, Stanford University, Palo Alto, California.
Lees, Robert. 1970. Problems in the Grammatical Analysis of English Nominal Compounds. In Bierwisch and Heidolph (eds.), Progress in Linguistics. Mouton, The Hague.
Levi, Judith N. 1978. The Syntax and Semantics of Complex Nominals. Academic Press, New York.
Marchand, Hans. 1969. The Categories and Types of Present Day English Word Formation. C.H. Becksche, Munich.
McDonald, David. 1995. Lexical Discontinuities in the Functional Meaning of Words. Working Notes of Multilingual Text Generation Workshop. IJCAI, August 20-21, Montreal, Quebec.
McDonald, David B. 1982. Understanding Noun Compounds. CMU Technical Report CS-82-102.
Pollard, Carl, and Ivan Sag. 1987. Information-based Syntax and Semantics, Volume 1: Fundamentals. CSLI Lecture Notes Series No. 13. Centre for the Study of Language and Information, Stanford University.
Pollard, Carl, and Ivan Sag. 1994. Head-driven Phrase Structure Grammar. University of Chicago Press, Chicago.
Pustejovsky, James. 1991. The Generative Lexicon. Computational Linguistics 17.4.
Pustejovsky, James. 1995. The Generative Lexicon. MIT Press, Cambridge, Massachusetts.
Warren, Beatrice. 1987. Semantic Patterns of Noun-Noun Compounds. Gothenburg Studies in English 41. Acta Universitatis Gothoburgensis, Gothenburg.